Agriculture is the backbone of many countries. It provides food and other essential materials.
Various diseases affect agricultural crops, which reduces both the quantity and the quality of
agricultural output. This can lower food production and thereby affect economic growth and
development. Even though the symptoms and other effects of the diseases are outwardly visible,
manual identification and rectification of diseases is a tedious and time-consuming process.
Detecting the diseases with an automatic computer-based model is therefore an effective solution.
Image processing methods combined with machine learning algorithms greatly assist plant disease
detection. In the proposed work, plant leaf images of 10 crops are collected as the dataset. After
acquisition, the images are preprocessed using brightness preserving dynamic fuzzy histogram
equalization (BPDFHE), an advanced version of histogram equalization, together with Gaussian
filtering. The results are evaluated and compared using peak signal to noise ratio (PSNR),
structural similarity index (SSIM) and mean square error (MSE). The method performs more
accurately than the existing preprocessing approaches.

Keywords: Preprocessing
1. INTRODUCTION

Agriculture is a very important sector that supports us in many ways. Many nations depend on
agriculture for their daily needs. From food to energy products, agriculture plays many roles in
our lives. Still, the agricultural sector faces many obstacles that affect the production, quantity
and quality of crops. The most frequent problems are diseases caused by pests and other insects,
and climatic changes. The traditional method of identifying and eradicating diseases is naked eye
observation. This method cannot be relied upon for large scale farming: it is time consuming, and
the accuracy of the resulting disease detection cannot be trusted. Farmers must also be trained and
experienced in order to detect diseases by naked eye observation.

An alternative is a computer based automatic disease detection system, which collects the relevant
datasets and then applies various techniques to classify the images according to disease. This has
been a turning point in the history of disease detection, and many ongoing studies on automatic
plant disease identification have led to remarkable developments. Crop cultivation patterns have
changed drastically as a result of these studies. They have also helped farmers to gain knowledge
and experts to perform their work more easily and accurately. All this progress has improved
overall agricultural production and thereby increased the economic gain. Still, areas such as
disease detection, pest detection, and the effects of climatic changes have seen limited
improvement, because the variety of diseases and pests affecting plants keeps evolving. Completely
eliminating them is impossible; we can only control them to a certain level, and automated systems
can help accomplish this. The performance and accuracy of these systems are the areas to focus on
in order to get better outputs.

Figure 1 shows the general setup of an automated system. It is a step by step process that starts
with data acquisition. The dataset is collected according to the requirement and then cleaned of
noise and other distortions. This process is known as preprocessing, and there are various methods
to preprocess a dataset. Here we use brightness preserving dynamic fuzzy histogram equalization
(BPDFHE), an advanced histogram equalization technique, together with Gaussian filtering to
preprocess the image dataset. Next, features or properties of the preprocessed image are extracted
to train the system. The output can be obtained using classifiers such as support vector machine
(SVM), k-nearest neighbors (KNN), neural networks, and convolutional neural networks (CNN). In this
work we apply BPDFHE and compare it with the traditional histogram equalization technique.


[Block diagram: Image Acquisition -> Preprocessing (Resizing, Filtering Techniques, Contrast
Enhancement) -> Image Segmentation (Watershed, Edge Detection) -> Feature Extraction (Training set |
Testing set) -> Classification (CNN, ANN, SVM, KNN) -> Performance measures]

Figure 1. General block diagram of an automated plant disease detection system


2. LITERATURE REVIEW 

A large variety of automated crop disease detection models have been developed and are available in
the market. This section reviews some commonly used preprocessing techniques and their merits over
the others. Alessandrini et al. [1] collected images of healthy and unhealthy grapevine leaves under
real conditions to detect and classify Esca disease in vineyards. They also discussed in detail the
devices used to collect the datasets. The initial stages, which include dataset acquisition and
preprocessing, have to be handled carefully because they provide the base for further processing. A
large amount of data is lost due to the noise and blur introduced by the devices. To rectify this
issue, Latif et al. [2] proposed a non-blurring technique to improve the input images. They also
discussed a hybrid method to remove all kinds of noise in the preprocessing stage, using OpenCL
parallel programming on a heterogeneous XU4 system. Kumar and Kumar [3] studied tomato leaves and
the various diseases affecting them. As part of preprocessing they removed noise, and deblurred,
compressed and resized the images. A CNN architecture is used for classification.

A hybrid algorithm is proposed in [4] to enhance the contrast, preprocess and segment the image for
accurate classification. The authors also used a Gaussian filter, a low pass filter that removes
noise by blurring the image, in order to identify rice varieties. Karuppusamy [5] suggests a CNN
architecture with two additional novel layers for the development of any detection system. Two
modifications are made to the CNN: first, the Euler methodology is used for feature vector
transformation, and second, the raw and normalized features are combined for better system
performance. Many fungal diseases affect plants [6]. One fungal disease commonly observed on
vegetative crops is Fusarium wilt. In that paper, tomato leaves are taken as input data and a
2-factor identification method is followed for better accuracy. Tripathi [7] classifies fruits using
recent architectures trained on extracted features. Resizing and data augmentation are used for
preprocessing. Features are extracted with convolutional and pooling layers and classified using
several classifiers. The performance analysis shows that DenseNet outperforms all the other
classifiers in performance and accuracy. Ferentinos [8] took 38 classes of vegetative crops and
trained several classifiers. Of these, the visual geometry group (VGG) classifier gave the best
result, with an accuracy of 99.53%. Hamdani et al. [9] discuss the diseases affecting oil palm
leaves and build a model with neural networks. The features used to train and test the system were
extracted using principal component analysis (PCA). They used 300 leaf images, and the model gives
higher performance than the existing systems.

Dayang and Meli [10] proposed a method to test and compare segmentation techniques such as
k-nearest, Canny edge detection, and k-means clustering. Corn, tomato and potato leaves were taken
as the study material. The experimental results show that the k-nearest algorithm performs well
compared to the other two. A detection system to identify three commonly occurring rice leaf
diseases is developed in [11]. After processing, the images are fed to four classifiers and the
results are computed. The decision tree algorithm showed the best performance, with an accuracy of
97.91%. Madhavan et al. [12] used MATLAB to preprocess pomegranate leaf images and classified them
using the SVM algorithm. Chen et al. [13] propose a modification to the existing transfer learning
algorithm. They combine MobileNet with a squeeze-and-excitation block to form a new network, and
also perform transfer learning separately to obtain the expected result. Zhao et al. [14] discuss
the difficulty of classifying unbalanced image datasets. To solve this they use the DoubleGAN
network, which is also compared with many other networks. Hua et al. [15] propose a novel approach
to identify diseases affecting agricultural crops using a multi feature fusion algorithm known as
pest detecting region-based CNN (PD R-CNN). Experimental results show that this technique performs
more accurately than other algorithms. A technique to detect three commonly occurring rice leaf
diseases (leaf smut, brown spot and bacterial leaf blight) is proposed in [16]. It uses the hue and
saturation technique for preprocessing and an extreme gradient boosting decision tree ensemble for
classification.

A histogram equalization technique based on fuzzy logic is recommended in [17] to improve the
contrast of input images. The method is assessed using the metrics mean square error (MSE) and peak
signal to noise ratio (PSNR). The fuzzy histogram is divided into two sub-histograms at the average
value of the initial image, and these are then equalized to preserve the brightness of the image.
Archana and Sahayadhas [18] compare image quality for four filtering techniques (Gaussian filter,
median filter, mean filter and Wiener filter) on a common dataset. They observe that the Wiener
filter has the best PSNR and signal to noise ratio (SNR) values among the filters. Temiatse et al.
[19] use conventional histogram equalization to enhance lemon grass images, with MATLAB used to
calculate the efficiency of the system. The simulation showed that even though histogram
equalization is a traditional approach, it can effectively enhance the images and bring out the
hidden details present in each image.

Sudeep and Pal [20] discuss the importance of preprocessing the image data for a CNN based
classification algorithm, using the CIFAR10 dataset. Their accuracy is highest for zero component
analysis (ZCA) when compared with both mean normalization and standardization. Vishnoi et al. [21]
review a variety of feature extraction techniques used in the development of automatic plant
disease identification and classification models. The comparative study shows that several features
taken together give more accurate values than single feature types. Chethan et al. [22] use an
advanced histogram equalization technique to preprocess the images, which are then segmented using
k-means clustering. Features are extracted with the help of a grey-level co-occurrence matrix.
SVM and CNN are then used to classify the diseases based on the extracted features, and the results
are compared. In [23], the images are resized as part of preprocessing and their contrast is then
enhanced using contrast limited adaptive histogram equalization (CLAHE). The final algorithm
combines five CNN architectures to produce the output, and the results state that the ensemble
model is more accurate than the individual architectures. To help computer vision models achieve
better results on low resolution and comparatively poor contrast images [24], Sambasivam and Opiyo
[25] included the CLAHE algorithm in their work. Kaur [26] proposed BPDFHE as a preprocessing
technique that enhances the contrast of input image data. A comparative evaluation is made against
existing preprocessing techniques such as CLAHE and contrast stretching. The quality of the image
data is measured using the PSNR, structural similarity index (SSIM) and MSE metrics, and the output
shows an improvement with the proposed algorithm. Sheet et al. [27] suggested a novel modification
to the existing preprocessing technique BPDHE to overcome its limitations. The advanced method,
BPDFHE, uses the fuzzy domain to represent and process digital images. Experimental results show
that BPDFHE gives more accurate results than BPDHE. The computational time of each method is also
discussed in that paper. Kuber et al. [28] compare conventional histogram equalization techniques
such as CLAHE, uniform histogram equalization (UHE), global histogram equalization (GHE), and
brightness preserving dynamic histogram equalization (BPDHE) with BPDFHE. The results show that
BPDFHE can proficiently conserve the mean image brightness and provide better PSNR values than the
other histogram models.


3. PROPOSED METHODOLOGY 

Preprocessing of the input data is a significant stage because it provides noise free, high
contrast images for accurate segmentation and feature extraction. It also improves the input data
by eliminating unwanted elements from the image and intensifying features that are important in the
later stages. Of the various preprocessing techniques, we follow BPDFHE, an advanced version of
conventional histogram equalization, along with Gaussian filtering. Histogram equalization is a
commonly used preprocessing technique that uses the image histogram to evaluate the frequency
distribution of pixel intensities. The histogram is similar to a bar graph, and this graphical
representation reveals the low contrast areas of the image. Equalization takes the frequently
occurring pixel intensity values as the threshold and spreads out the values based on them. A major
drawback arises when some pixel intensity values are inexact. Such inexact or vague gray values
cannot be handled by conventional histogram equalization, which can produce variations in the
expected image histogram. In these situations it is better to use the BPDFHE technique, which
represents the pixel values in the fuzzy domain. The fuzzy domain can manage the inexactness or
vagueness of the gray level values. BPDFHE also shows good contrast enhancement with reduced
computational complexity, and performs well compared with the other histogram equalization
techniques. Figure 2 gives a pictorial representation of the preprocessing stage performed in this
work.


[Block diagram: Input Image -> BPDFHE (Contrast Enhancement) + Gaussian Filtering (Noise Removal)
-> Processed Image]

Figure 2. Block diagram of the proposed model


3.1. Image dataset 

We use the PlantVillage dataset, which contains 38 classes of plant varieties, and select 10
classes of crops (tomato, bell pepper, potato, cucumber, bitter gourd, brinjal, pumpkin, peas,
paprika, cluster beans) as the input dataset. Altogether the PlantVillage dataset contains 54,303
images of diseased and healthy leaves, of which we took the 10,505 images corresponding to the 10
chosen classes of crops. It is an international image dataset mainly used for detecting plant
diseases with machine learning algorithms, and this open access repository of images is available
through the Kaggle website. Various preprocessing techniques are available and can be followed as
needed. In this paper we use the advanced version of histogram equalization, BPDFHE, to preprocess
the input data. Histogram equalization is an approach to intensify the contrast of an image. The
enhancement equalizes the histogram of the resulting image and changes its overall shape. BPDFHE
modifies the standard method so that the peaks of the input histogram are not remapped in the
output histogram during the mapping step. Its main advantages are low computation time and better
contrast improvement. Gaussian blur is the other preprocessing technique followed in this paper.
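
As a rough illustration of the Gaussian blur step, the sketch below filters a grayscale image with a separable Gaussian kernel using NumPy only. The function names, the default `sigma`, the kernel radius of about three standard deviations and the reflective border handling are all assumptions made for this example; the paper does not state the filter parameters it used.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (illustrative helper)."""
    if radius is None:
        radius = int(3 * sigma + 0.5)  # assumed: cover roughly +/- 3 sigma
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma=1.0):
    """Blur a 2-D grayscale image with a separable Gaussian filter."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    # Reflect the borders so the output keeps the input shape.
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    # The 2-D Gaussian is separable: filter the rows, then the columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
```

A larger `sigma` removes more noise but also blurs fine leaf texture, so in practice the value would be tuned against the quality metrics.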


3.2. Histogram equalization 
Histogram equalization is one of the oldest and most frequently used preprocessing techniques. It
is mainly used to improve the contrast of the input image. Because the image quality has to be
improved for the subsequent processes, enhancing the contrast of the input image is an essential
preprocessing step. The procedure is as follows:
i. Import the required libraries
ii. Read the input image
iii. Generate the histogram for the input image
iv. Calculate the probability mass function (PMF) either using a histogram or a matrix
v. Evaluate the cumulative distribution function (CDF) by taking the cumulative sums of all the
values calculated using the PMF
vi. Obtain a new set of grey level values, which are mapped onto the histogram to produce an
equalized image
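
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration for an 8-bit grayscale array that is already in memory, so step ii (reading the image) is left out; the function name and the `levels` parameter are chosen for this example only.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Equalize an 8-bit grayscale image following steps iii to vi."""
    # iii. Histogram: number of pixels at each gray level.
    hist = np.bincount(image.ravel(), minlength=levels)
    # iv. PMF: normalize the counts so that they sum to 1.
    pmf = hist / image.size
    # v. CDF: cumulative sum of the PMF.
    cdf = np.cumsum(pmf)
    # vi. Map each gray level to its new value and apply the lookup table.
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[image]
```

Because the mapping is a lookup table indexed by gray level, the whole image is remapped in a single indexing operation.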


3.3. Brightness preserving dynamic fuzzy histogram equalization (BPDFHE) 

BPDFHE is an advanced form of histogram equalization that has better contrast enhancement
capabilities with reduced computational complexity. Conventional histogram equalization uses the
image histogram to evaluate the frequency distribution of pixel intensities and spreads out the
frequently occurring intensity values. Its major drawback is that inexact or vague gray values
cannot be taken into account, which can produce variations in the expected image histogram. In
BPDFHE the image values are represented in the fuzzy domain, which lets the equalization handle the
vagueness of the gray levels in a finer way and so perform better. BPDFHE consists of the following
stages: i) fuzzy histogram calculation, ii) splitting up of the histogram, iii) dynamic histogram
equalization (DHE) of the partitions, and iv) normalization of image brightness.


3.3.1. Fuzzy histogram calculation 

In BPDFHE, the inexactness of the gray level values of the pixels is handled in the fuzzy domain.
By considering all the values, including the vague ones, a smooth histogram is produced. The fuzzy
histogram is a sequence of real numbers $h(i)$, $i \in \{0, 1, \ldots, L-1\}$, where $h(i)$
represents the frequency of the gray values. Treating the gray value $I(x, y)$ as a fuzzy number
$\tilde{I}(x, y)$, the fuzzy histogram is calculated as

$$h(i) \leftarrow h(i) + \sum_{x} \sum_{y} \mu_{\tilde{I}(x,y)\,i}, \quad k \in [a, b] \tag{1}$$

where $\mu_{\tilde{I}(x,y)\,i}$ is the triangular fuzzy membership function, defined as

$$\mu_{\tilde{I}(x,y)\,i} = \max\left(0,\; 1 - \frac{|I(x,y) - i|}{4}\right) \tag{2}$$

Here $[a, b]$ is the support of the membership function.
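
A minimal NumPy sketch of the fuzzy histogram follows. It relies on the observation that summing a triangular membership over all pixels is the same as convolving the ordinary (crisp) histogram with triangular weights. The function name and the half-width parameter `spread` (set to 4 here as an assumed default) are illustrative and not taken from the paper.

```python
import numpy as np

def fuzzy_histogram(image, levels=256, spread=4):
    """Fuzzy histogram using a triangular membership of half-width `spread`."""
    counts = np.bincount(image.ravel(), minlength=levels).astype(np.float64)
    # Offsets with non-zero membership, i.e. the support of the triangle.
    offsets = np.arange(-spread + 1, spread)
    # Triangular membership: max(0, 1 - |I - i| / spread).
    weights = 1.0 - np.abs(offsets) / spread
    # A pixel with gray value I contributes weights[...] to the bins around I,
    # which is a convolution of the crisp histogram with the weights.
    return np.convolve(counts, weights, mode="same")
```

For an image in which every pixel has the same gray value, the result is a small triangle centered on that value, which is the smoothing effect described in the text.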


3.3.2. Splitting up of histogram 

Partitioning the histogram is the first stage of DHE. The partition points are found from
consecutive local maxima of the fuzzy histogram. Local maxima here are the peaks of the histogram,
that is, gray levels whose counts are higher than those of their neighbors. Once two consecutive
local maxima are found, the histogram is partitioned at the valley between them, so that the
original histogram is divided into secondary (sub) histograms. Local maxima are detected using the
central difference operator as a discrete approximation of the derivative. The first order
derivative is

$$h'(i) = \frac{dh(i)}{di} \triangleq \frac{h(i+1) - h(i-1)}{2} \tag{3}$$

The second order derivative can be calculated as

$$h''(i) = \frac{d^{2}h(i)}{di^{2}} \triangleq h(i+1) - 2h(i) + h(i-1) \tag{4}$$

Where neighboring points both qualify as maxima, the one with the highest count is chosen, which
eliminates the ambiguity. If there are $(n+1)$ local maxima, their intensity levels are denoted
$\{m_0, m_1, \ldots, m_n\}$. If the fuzzy histogram lies in the range $[I_{min}, I_{max}]$, then
$(n+1)$ sub-histograms are generated by the partition, with ranges
$\{[I_{min}, m_0], [m_0+1, m_1], \ldots, [m_n+1, I_{max}]\}$.
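
The sketch below shows one way the detection and splitting could be coded. The text does not give the exact test for a maximum, so the criterion used here (the first central difference changes sign around the point and the second difference is negative) is an assumption of this example, as are the function names. Adjacent candidates are merged by keeping the one with the highest count, and the intervals follow the range expression given above.

```python
import numpy as np

def local_maxima(h):
    """Indices of histogram peaks found with central differences."""
    h = np.asarray(h, dtype=np.float64)
    d1 = np.zeros_like(h)
    d2 = np.zeros_like(h)
    d1[1:-1] = (h[2:] - h[:-2]) / 2.0           # first central difference
    d2[1:-1] = h[2:] - 2.0 * h[1:-1] + h[:-2]   # second central difference
    # Assumed criterion: slope changes sign and curvature is negative.
    candidates = [i for i in range(1, len(h) - 1)
                  if d1[i - 1] * d1[i + 1] < 0 and d2[i] < 0]
    # Resolve ambiguity: in each run of adjacent candidates keep the highest.
    peaks, group = [], []
    for i in candidates:
        if group and i != group[-1] + 1:
            peaks.append(max(group, key=lambda k: h[k]))
            group = []
        group.append(i)
    if group:
        peaks.append(max(group, key=lambda k: h[k]))
    return peaks

def partition(h, peaks):
    """Split [I_min, I_max] into inclusive intervals ending at each maximum."""
    nonzero = np.flatnonzero(h)
    i_min, i_max = int(nonzero[0]), int(nonzero[-1])
    bounds, start = [], i_min
    for m in peaks:
        bounds.append((start, m))
        start = m + 1
    bounds.append((start, i_max))
    return bounds
```

On a real fuzzy histogram the number of detected maxima depends strongly on how smooth the histogram is, which is one reason the fuzzy smoothing step comes first.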


3.3.3. Dynamic histogram equalization of partitions 

Here the sub-histograms obtained in the previous step are equalized. Each sub-histogram is
equalized over a span that depends on the total number of pixels in the partition. The process
involves two operations: first the partitions are mapped to a dynamic range, and then each
histogram is equalized. The dynamic range mapping is governed by the following parameters:

$$\mathrm{span}_i = \mathrm{high}_i - \mathrm{low}_i \tag{6}$$

where $\mathrm{high}_i$ and $\mathrm{low}_i$ are the highest and lowest intensities of the $i$-th
input sub-histogram, respectively.

$$\mathrm{factor}_i = \mathrm{span}_i \times \log_{10} M_i \tag{7}$$

Here $M_i$ is the total number of pixels in the $i$-th sub-histogram and $\mathrm{span}_i$ is the
dynamic range of the input sub-histogram. The dynamic range of the corresponding output
sub-histogram, $\mathrm{range}_i$, is given as

$$\mathrm{range}_i = \frac{(L-1) \times \mathrm{factor}_i}{\sum_{k=1}^{n+1} \mathrm{factor}_k} \tag{8}$$

The bounds of the $i$-th output sub-histogram are then calculated as

$$\mathrm{start}_i = \sum_{k=1}^{i-1} \mathrm{range}_k + 1 \tag{9}$$

and

$$\mathrm{stop}_i = \sum_{k=1}^{i} \mathrm{range}_k \tag{10}$$

The exceptions are at the two extremities, where
$[\mathrm{start}_1, \mathrm{stop}_1] = [0, \mathrm{range}_1]$ and
$[\mathrm{start}_{n+1}, \mathrm{stop}_{n+1}] = \left[\sum_{k=1}^{n} \mathrm{range}_k + 1,\; L-1\right]$.
To equalize each sub-histogram, remapped values are needed. For the $i$-th sub-histogram they are
obtained from (11):

$$y(j) = \mathrm{start}_i + \mathrm{range}_i \sum_{k=\mathrm{start}_i}^{j} \frac{h(k)}{M_i} \tag{11}$$

where $y(j)$ is the new intensity level and $M_i$ is the total number of pixels in the $i$-th
sub-histogram.
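
The following sketch turns equations (6) to (11) into a lookup table from input to output gray levels. It is an illustration under several assumptions: the partitions are passed in as inclusive `(low, high)` pairs, every partition has a population greater than 1 so that the logarithm is positive, the cumulative sum in (11) is taken from the lower bound of the input partition, and values are rounded and clipped to the valid range. The function name `dynamic_equalize` is hypothetical.

```python
import numpy as np

def dynamic_equalize(h, bounds, levels=256):
    """Build the lookup table y(j) from a fuzzy histogram and its partitions."""
    h = np.asarray(h, dtype=np.float64)
    spans = np.array([hi - lo for lo, hi in bounds], dtype=np.float64)  # (6)
    pops = np.array([h[lo:hi + 1].sum() for lo, hi in bounds])          # M_i
    factors = spans * np.log10(pops)                                    # (7)
    ranges = (levels - 1) * factors / factors.sum()                     # (8)
    stops = np.cumsum(ranges)                                           # (10)
    starts = np.concatenate(([0.0], stops[:-1] + 1.0))                  # (9)
    lut = np.zeros(levels, dtype=np.float64)
    for (lo, hi), start, width, pop in zip(bounds, starts, ranges, pops):
        cdf = np.cumsum(h[lo:hi + 1]) / pop      # cumulative share of M_i
        lut[lo:hi + 1] = start + width * cdf                            # (11)
    # Gray levels above the last partition map to the top of the range.
    lut[bounds[-1][1] + 1:] = levels - 1
    return np.clip(np.round(lut), 0, levels - 1).astype(np.uint8)
```

Applying the table is then a single indexing step, for example `equalized = lut[image]` for an 8-bit image.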


3.3.4. Normalization of image brightness 

The mean brightness obtained after DHE of each sub-histogram differs slightly from that of the
input image. The output image is normalized to overcome this issue. Let $f$ denote the image
obtained after DHE, and let $m_i$ and $m_f$ be the mean brightness of the input image and of $f$,
respectively. If $g$ represents the output of the BPDFHE technique, the grey level value at pixel
location $(x, y)$ of $g$ is given by

$$g(x, y) = \frac{m_i}{m_f}\, f(x, y) \tag{12}$$
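
A minimal sketch of this normalization step is given below. It assumes that the step rescales the equalized image by the ratio of the input mean to the equalized mean, that the equalized image is not entirely black, and that the result is rounded and clipped to 8 bits. The names `normalize_brightness`, `original` and `equalized` are chosen for this example.

```python
import numpy as np

def normalize_brightness(original, equalized, levels=256):
    """Rescale `equalized` so that its mean matches the mean of `original`."""
    mean_in = original.astype(np.float64).mean()
    mean_eq = equalized.astype(np.float64).mean()   # assumed to be non-zero
    scaled = (mean_in / mean_eq) * equalized.astype(np.float64)
    # Round and clip so the result is again a valid 8-bit image.
    return np.clip(np.round(scaled), 0, levels - 1).astype(np.uint8)
```

When clipping does not occur, the mean of the returned image differs from the input mean only by rounding error.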


4. RESULTS AND DISCUSSION

The figures and tables show the results after applying the preprocessing techniques. We consider
the conventional histogram equalization technique and its advanced version, BPDFHE. Both techniques
are applied to the input images and compared using the image quality metrics PSNR, SSIM, and MSE.
Figure 3 shows the image after applying histogram equalization. The proposed preprocessing
technique combines BPDFHE and Gaussian filtering. Figure 4 shows the image after Gaussian blur, and
the combined result is shown in Figure 5. The quality metric values are listed in Table 1.

Figure 3. Histogram equalized image and its corresponding graph

Figure 4. Gaussian blurred image

Figure 5. Image after preprocessing


Table 1. Quality evaluation done for leaf image 1 and 2

Image ID   Quality metric   Histogram equalized image   BPDFHE image   Existing system   Proposed system
Image 1    PSNR             27.96                       27.9979        26.04             31.509
Image 1    SSIM             0.756                       0.9219         0.9652            0.978
Image 1    MSE              231.16                      230.90         239.87            304.21
Image 2    PSNR             27.92                       27.80          26.04             34.10
Image 2    SSIM             0.2356                      0.2665         0.9652            0.9653
Image 2    MSE              187.02                      204.04         239.87            134.90
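
For reference, the three quality metrics can be computed along the following lines. MSE and PSNR use their standard definitions for 8-bit images. The SSIM function is a simplified variant evaluated once over the whole image rather than averaged over local windows, so its values will generally differ from those of a windowed SSIM implementation and from the values reported in Table 1. All function names are illustrative.

```python
import numpy as np

def mse(reference, processed):
    """Mean square error between two images of the same shape."""
    diff = reference.astype(np.float64) - processed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference, processed, peak=255.0):
    """Peak signal to noise ratio in decibels."""
    err = mse(reference, processed)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

def global_ssim(reference, processed, peak=255.0):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    x = reference.astype(np.float64)
    y = processed.astype(np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # usual stabilizers
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mean_x) * (y - mean_y)).mean()
    return float(((2 * mean_x * mean_y + c1) * (2 * cov + c2))
                 / ((mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)))
```

Lower MSE and higher PSNR and SSIM indicate that the processed image is closer to the reference.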
5. CONCLUSION

Preprocessing is a necessary step that improves the input image data for further evaluation and
development, so that appropriate results can be achieved. Cropping or resizing the image is an
important technique for eliminating unwanted elements, which reduces both memory use and
computation time. Various filtering methods are used in preprocessing to remove noise and other
distortions. The choice of enhancement technique depends on the contrast of the images. Images that
have been preprocessed give more accurate results. In this work we used BPDFHE and Gaussian
filtering to preprocess the images. The experimental results in Table 1 show higher PSNR and SSIM
values for the proposed system than for the existing system.